---
title: ScrapeGraph
description: ScrapeGraphTools enable an Agent to extract structured data from webpages, convert content to markdown, and retrieve raw HTML content.
---

**ScrapeGraphTools** enable an Agent to extract structured data from webpages, convert content to markdown, and retrieve raw HTML content using the ScrapeGraphAI API.

The toolkit provides 6 functions:

1. **smartscraper**: Extract structured data using natural language prompts
2. **markdownify**: Convert web pages to markdown format
3. **searchscraper**: Search the web and extract information
4. **crawl**: Crawl websites with structured data extraction
5. **scrape**: Get raw HTML content from websites
6. **agentic_crawler**: Perform automated browser actions with optional AI extraction

The scrape method is particularly useful when you need:
- Complete HTML source code
- Raw content for further processing
- HTML structure analysis
- Content that needs to be parsed differently
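
As a rough illustration of the "further processing" case, raw HTML could be handed to Python's standard-library `html.parser`. The sketch below uses a hard-coded sample string in place of real tool output, and `LinkCollector` is a hypothetical helper, not part of the toolkit:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Sample HTML standing in for the raw content a scrape call might return
sample_html = """
<html><body>
  <h1>Science</h1>
  <a href="/story/one">First story</a>
  <a href="https://example.com/two">Second story</a>
  <a name="anchor-without-href">No link</a>
</body></html>
"""

collector = LinkCollector()
collector.feed(sample_html)
print(collector.links)
```

The same pattern extends to other tags or attributes by adding branches to `handle_starttag`.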

All functions support heavy JavaScript rendering; enable it by setting `render_heavy_js=True` on the toolkit.


## Prerequisites

The following examples require the `scrapegraph-py` library.

```shell
pip install -U scrapegraph-py
```

ScrapeGraphTools reads the ScrapeGraph API key from the `api_key` parameter or, if that is not provided, from the `SGAI_API_KEY` environment variable:

```shell
export SGAI_API_KEY="YOUR_SGAI_API_KEY"
```
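
A script might verify that a key is available before building the agent, so a missing key fails early with a clear message. The following is a minimal sketch of such a check; `resolve_sgai_api_key` is a hypothetical helper that mirrors the explicit-key-then-environment fallback described in the Toolkit Params table, and it is not part of the toolkit:

```python
import os


def resolve_sgai_api_key(explicit_key=None):
    """Return the explicit key if given, else fall back to SGAI_API_KEY."""
    key = explicit_key or os.environ.get("SGAI_API_KEY")
    if not key:
        raise ValueError(
            "No ScrapeGraph API key found: pass one explicitly or set SGAI_API_KEY"
        )
    return key


try:
    resolve_sgai_api_key()
    print("ScrapeGraph API key found")
except ValueError as exc:
    print(f"Configuration problem: {exc}")
```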

## Example

The following agent will extract structured data from a website using the smartscraper tool:

```python cookbook/tools/scrapegraph_tools.py
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.scrapegraph import ScrapeGraphTools

agent_model = OpenAIChat(id="gpt-4.1")
scrapegraph_smartscraper = ScrapeGraphTools(enable_smartscraper=True)

agent = Agent(
    tools=[scrapegraph_smartscraper], model=agent_model, markdown=True, stream=True
)

agent.print_response("""
Use smartscraper to extract the following from https://www.wired.com/category/science/:
- News articles
- Headlines
- Images
- Links
- Author
""")
```

### Raw HTML Scraping

Get complete HTML content from websites for custom processing:

```python cookbook/tools/scrapegraph_tools.py
# Enable scrape method for raw HTML content
scrapegraph_scrape = ScrapeGraphTools(enable_scrape=True, enable_smartscraper=False)

scrape_agent = Agent(
    tools=[scrapegraph_scrape],
    model=agent_model,
    markdown=True,
    stream=True,
)

scrape_agent.print_response(
    "Use the scrape tool to get the complete raw HTML content from https://en.wikipedia.org/wiki/2025_FIFA_Club_World_Cup"
)
```

### All Functions with JavaScript Rendering

Enable all ScrapeGraph functions with heavy JavaScript support:

```python cookbook/tools/scrapegraph_tools.py
# Enable all ScrapeGraph functions
scrapegraph_all = Agent(
    tools=[
        ScrapeGraphTools(all=True, render_heavy_js=True)
    ],  # render_heavy_js=True enables heavy JavaScript rendering
    model=agent_model,
    markdown=True,
    stream=True,
)

scrapegraph_all.print_response("""
Use any appropriate scraping method to extract comprehensive information from https://www.wired.com/category/science/:
- News articles and headlines
- Convert to markdown if needed  
- Search for specific information
""")
```

<Note>View the [Startup Analyst example](/examples/use-cases/agents/startup-analyst-agent) </Note>

## Toolkit Params

| Parameter                | Type           | Default | Description                                                                                               |
| ------------------------ | -------------- | ------- | --------------------------------------------------------------------------------------------------------- |
| `api_key`                | `Optional[str]`| `None`  | ScrapeGraph API key. If not provided, uses the `SGAI_API_KEY` environment variable.                       |
| `enable_smartscraper`    | `bool`         | `True`  | Enable the smartscraper function for LLM-powered data extraction.                                       |
| `enable_markdownify`     | `bool`         | `False` | Enable the markdownify function for webpage to markdown conversion.                                      |
| `enable_crawl`           | `bool`         | `False` | Enable the crawl function for website crawling and data extraction.                                      |
| `enable_searchscraper`   | `bool`         | `False` | Enable the searchscraper function for web search and information extraction.                            |
| `enable_agentic_crawler` | `bool`         | `False` | Enable the agentic_crawler function for automated browser actions and AI extraction.                    |
| `enable_scrape`          | `bool`         | `False` | Enable the scrape function for retrieving raw HTML content from websites.                                |
| `render_heavy_js`        | `bool`         | `False` | Enable heavy JavaScript rendering for all scraping functions. Useful for SPAs and dynamic content.      |
| `all`                    | `bool`         | `False` | Enable all available functions. When `True`, the individual `enable_*` flags are ignored.                 |

## Toolkit Functions

| Function           | Description                                                                                                                                    |
| ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| `smartscraper`     | Extract structured data from a webpage using an LLM and a natural language prompt. Parameters: url (str), prompt (str).                       |
| `markdownify`      | Convert a webpage to markdown format. Parameters: url (str).                                                                                  |
| `crawl`            | Crawl a website and extract structured data. Parameters: url (str), prompt (str), data_schema (dict), cache_website (bool), depth (int), max_pages (int), same_domain_only (bool), batch_size (int). |
| `searchscraper`    | Search the web and extract information. Parameters: user_prompt (str).                                                                        |
| `agentic_crawler`  | Perform automated browser actions with optional AI extraction. Parameters: url (str), steps (List[str]), use_session (bool), user_prompt (Optional[str]), output_schema (Optional[dict]), ai_extraction (bool). |
| `scrape`           | Get raw HTML content from a website. Useful for complete source code retrieval and custom processing. Parameters: website_url (str), headers (Optional[dict]). |
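
The `crawl` and `agentic_crawler` functions take a schema dict (`data_schema` and `output_schema` respectively). The table does not specify the expected format, so the sketch below assumes a JSON Schema-style dict; the schema name and its fields (`articles`, `title`, `author`, `url`) are purely illustrative:

```python
import json

# Hypothetical schema describing the shape of each extracted article.
# Assumption: the schema dict follows JSON Schema conventions.
article_schema = {
    "type": "object",
    "properties": {
        "articles": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "author": {"type": "string"},
                    "url": {"type": "string"},
                },
                "required": ["title", "url"],
            },
        }
    },
    "required": ["articles"],
}

# Round-trip through JSON to confirm the schema is serializable
serialized = json.dumps(article_schema, indent=2)
print(serialized)
```

Keeping the schema as a plain dict makes it easy to reuse across prompts or to validate the returned data separately.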

## Developer Resources

- View [Tools](https://github.com/agno-agi/agno/blob/main/libs/agno/agno/tools/scrapegraph.py)
- View [Cookbook](https://github.com/agno-agi/agno/tree/main/cookbook/tools/scrapegraph_tools.py)
- View [Tests](https://github.com/agno-agi/agno/blob/main/libs/agno/tests/unit/tools/test_scrapegraph.py) 